Decimal quantization#494
Decimal quantization#494etanol merged 1 commit intopython-babel:masterfrom kdeldycke:decimal-quantization
Conversation
Codecov Report
@@ Coverage Diff @@
## master #494 +/- ##
==========================================
+ Coverage 90.04% 90.06% +0.01%
==========================================
Files 24 24
Lines 4039 4046 +7
==========================================
+ Hits 3637 3644 +7
Misses 402 402
Continue to review full report at Codecov.
|
|
This PR was previously based on #491 and was partially duplicating it. Now that the later was merged, I rebased and squashed the whole code on top of the current This PR is ready to be reviewed and merged. |
benselme
left a comment
There was a problem hiding this comment.
- Better docs needed
- Improved commit message also needed
Not easy to review, but I think I have found some bugs using some data from ICU4J. The '#E0' pattern which is used in many locales for scientific notation should not truncate the fractional part.
Ex: for the 'es' locale: ICU gives "1,23445678E3", while babel gives "1E3".
I checked with other libraries, and the Elixir CLDR one gives the same result as ICU. I think it's a special case but I cannot find it in the specs.
There are other differences but all of them seem to stem from the fact that we ignore any tag with the 'draft' attribute.
babel/numbers.py
Outdated
| def format_decimal(number, format=None, locale=LC_NUMERIC): | ||
| def format_decimal( | ||
| number, format=None, locale=LC_NUMERIC, | ||
| decimal_quantization=True): |
There was a problem hiding this comment.
Doc for decimal_quantization is missing. What does it do, exactly ?
| currency_digits=True, format_type='standard'): | ||
| def format_currency( | ||
| number, currency, format=None, locale=LC_NUMERIC, currency_digits=True, | ||
| format_type='standard', decimal_quantization=True): |
There was a problem hiding this comment.
Doc for decimal_quantization is missing here too.
| def format_percent(number, format=None, locale=LC_NUMERIC): | ||
|
|
||
| def format_percent( | ||
| number, format=None, locale=LC_NUMERIC, decimal_quantization=True): |
There was a problem hiding this comment.
Same as above, undocumented parameter of a public API.
|
|
||
| def format_scientific(number, format=None, locale=LC_NUMERIC): | ||
| def format_scientific( | ||
| number, format=None, locale=LC_NUMERIC, decimal_quantization=True): |
There was a problem hiding this comment.
Doc for param missing, same as above
| detected in the pattern. Default is to not mess with the scale at all. | ||
| """ | ||
| scale = 0 | ||
| if '%' in ''.join(self.prefix + self.suffix): |
There was a problem hiding this comment.
The call to join is useless here. A simple concatenation would be enough.
There was a problem hiding this comment.
The call to join is indeed required as prefix and suffix are lists of strings. Removing the join introduce a regression. See: https://travis-ci.org/python-babel/babel/jobs/240689769#L1007
I reverted the change in 4887fdf. All unittests passes now.
| scale = 0 | ||
| if '%' in ''.join(self.prefix + self.suffix): | ||
| scale = 2 | ||
| elif u'‰' in ''.join(self.prefix + self.suffix): |
There was a problem hiding this comment.
The call to join is indeed required as prefix and suffix are lists of strings. Removing the join introduce a regression. See: https://travis-ci.org/python-babel/babel/jobs/240689769#L1007
I reverted the change in 4887fdf. All unittests passes now.
|
Thanks @benselme for the reviewing effort! I'll address all your comments in one or two weeks as I'm on holidays right now! 😎 |
|
I just addressed all coding and document issues. The PR has been squashed into one commit. @benselme : I know this PR is not easy to review. To ease the pain I took the time to write expansive tests, searching for edge-cases and undefined situations. I'm really happy with the results as I found a way to handle generically any rendering pattern. Now the only remaining issue concerns the |
|
Thanks a lot for your work. What's here looks good to me. But I think the I would also love to have @etanol 's opinion on all this since he's the other person who recently did some work on numbers. PS: the CLDR-users mailing list is a very useful resource. |
|
Sorry to respond so late, but I haven't been involved in Python development for a while. The changes look good, although they are a bit difficult to review because refactoring and new functionality are not in separate commits. If you have the time, please consider splitting this PR in to two commits, otherwise it's very confusing to tell apart code movements from actual new functionality or bugfixing. I also have a couple of other comments for specific snippets in case you want to address them. Overall, I'm okay with the feature as long as it doesn't break backwards compatibility. |
babel/numbers.py
Outdated
| def apply(self, value, locale, currency=None, force_frac=None): | ||
| frac_prec = force_frac or self.frac_prec | ||
| @property | ||
| def scale(self): |
There was a problem hiding this comment.
Turning a precomputed field into a property is going to have a small performance impact. My first motivation to contribute to this module was a use case we had, where currency formatting was consuming a lot of CPU time. This is not Java, so we don't have a JIT that can inline this method call.
If you have time, try to do some trivial benchmarking by comparing against the parent commit. Think about formatting hundreds of thousands of values.
There was a problem hiding this comment.
I do not want to add other module dependencies so I abandoned the idea of reusing a @cached_property decorator.
Instead I simply kept the original method to keep the code readable and documented. See how the scaling value is now computed once since commit 7de4329.
babel/numbers.py
Outdated
| Forced decimal quantization is active by default so we'll produce a | ||
| number string that is strictly following CLDR pattern definitions. | ||
| """ | ||
| # Manipulates decimal instances only. |
There was a problem hiding this comment.
Most of the one-line comments in this method are not very useful. I understand and appreciate the aesthetics, but in practice they don't provide any value. I think it's better to add small paragraphs near the complex chunks of code, explaining the abstract ideas that lead to them.
There was a problem hiding this comment.
I stand by the principle that you can't have too much documentation, but I don't really want to nit-pick here. Also note that this commenting style was originally there and I simply tried to keep the spirit of the current code base.
Anyway I made the necessary changes to remove one-line comments in commit 22b8f81.
babel/numbers.py
Outdated
| if not decimal_quantization: | ||
| # Bump decimal precision to the natural precision of the number if | ||
| # it exceeds the one we're about to use. | ||
| frac_prec = (frac_prec[0], max([frac_prec[1], self.precision(value)])) |
There was a problem hiding this comment.
So this is the core of the PR, right? As I mentioned in the discussion, it would be much better to separate this (and all dependant changes) to a separate commit. This will allow reviewers to choose granularity (since the GitHub UI allows to review all commits combined), and will ease potential bisections in the future.
There was a problem hiding this comment.
It is the core of the PR which is about bypassing decimal quantization. And I'll be happy to split it into another PR.
Because the code is currently highly intertwined, I'd rather complete the review process of the whole monolithic PR in its present form before attempting a code split.
There was a problem hiding this comment.
I think we are ready to see that split now. And I'm going to insist more on separating refactoring from the real decimal quantization optional behavior code. It will help potential future bisections.
babel/numbers.py
Outdated
| def apply(self, value, locale, currency=None, force_frac=None): | ||
| frac_prec = force_frac or self.frac_prec | ||
| @property | ||
| def scale(self): |
babel/numbers.py
Outdated
| return scale | ||
|
|
||
| @staticmethod | ||
| def precision(number): |
There was a problem hiding this comment.
I think method names should contain a verb – perhaps this should be get_precision()?
Also, instead of a static method, consider making it a free function?
There was a problem hiding this comment.
I renamed the method and made it a free function in commit 8c70916.
babel/numbers.py
Outdated
| return abs(decimal_tuple.exponent) | ||
|
|
||
| @staticmethod | ||
| def quantum(precision): |
There was a problem hiding this comment.
The comments for the precision method apply here too :)
tests/test_numbers.py
Outdated
| from datetime import date | ||
|
|
||
| from babel import numbers | ||
| from babel.numbers import ( |
There was a problem hiding this comment.
Isn't this a repeat of line 23, below?
tests/test_numbers.py
Outdated
|
|
||
| def test_format_decimal_quantization(): | ||
| # Test precision conservation. | ||
| test_data = [ |
There was a problem hiding this comment.
You can use @pytest.mark.parametrize here:
@pytest.mark.parametrize('input_value, expected', [
('10000', '10,000'),
# (etc, etc.)
])
def test_format_decimal_quantization(input_value, expected):
# ...This has the advantage that each of these cases will show up (and be addressable) as a separate test case, so if one of those fails, the reader will not have to read the locals to figure out which particular case failed. :)
tests/test_numbers.py
Outdated
| == u'1.099,98') | ||
|
|
||
|
|
||
| def test_format_currency_quantization(): |
There was a problem hiding this comment.
Same parametrize comment applies here.
tests/test_numbers.py
Outdated
|
|
||
| def test_scientific_exponent_displayed_as_integer(): | ||
| assert numbers.format_scientific(100000, locale='en_US') == u'1E5' | ||
| def test_format_percent_quantization(): |
tests/test_numbers.py
Outdated
| assert numbers.format_scientific(42, u'00000.000000E0000', locale='en_US') == u'42000.000000E-0003' | ||
|
|
||
|
|
||
| def test_format_scientific_quantization(): |
There was a problem hiding this comment.
And this oughta get parametrized too :D
| value = abs(value).normalize() | ||
|
|
||
| # Prepare scientific notation metadata. | ||
| if self.exp_prec: |
There was a problem hiding this comment.
There's big chunks of scientific notation code interspersed with more "regular" rendering here – would you think the exp_prec bits could maybe be refactored into separate functions?
| # Exponent grouping | ||
| fmt = numbers.format_scientific(12345, '##0.####E0', locale='en_US') | ||
| self.assertEqual(fmt, '12.345E3') | ||
| self.assertEqual(fmt, '1.2345E4') |
There was a problem hiding this comment.
So was this a bug, or a deliberate change in output?
There was a problem hiding this comment.
That is a deliberate change in output to fix what I think is a bug. The ##0.####E0 pattern implies only the first digit before the decimal separator is required.
|
I just finished addressing all pending comments of this PR. What needs to be done now: |
babel/numbers.py
Outdated
| # triggered if the decimal quantization is disabled or if a scientific | ||
| # notation pattern has a missing mandatory fractional part (as in the | ||
| # default '#E0' pattern). This special case has been extensively | ||
| # disccused at https://github.com/python-babel/babel/pull/494#issuecomment-307649969 . |
There was a problem hiding this comment.
Small typo here discussed not disccused
|
Thanks ! I'll try and have a look at your changes this week. |
|
Sorry, weeks go by and I've been quite busy. Not forgetting about this PR though. Thanks for your patience. |
|
Thanks @benselme for the feedback! I know this PR is kind of tough so take your time. |
|
Any news on that one? :) |
|
I just finished splitting this PR. #538 is now waiting to be merged so I can finally rebase that one to add proper decimal quantization control. |
|
This PR has been rebased on top of the latest |
|
I think three years is enough proof of patience and perseverance. Thanks @kdeldycke for having both. |
|
Thanks @etanol for the merge! 😃 |
This PR add a
decimal_quantizationparameter toformat_decimal(),format_currency(),format_percent()andformat_scientific(), to bypass the forced quantization on the fractional part. This PR definitely address #90 and is a continuation of #410.This PR has been extensively reviewed and is waiting for #538 to be merged, so I can rebase that one on top of it.